In [ ]:
# Make it Python2 & Python3 compatible
from __future__ import print_function
The Spark kernel provided in the Notebook automatically creates an SQLContext object called sqlContext (also available under the sqlCtx alias), just as it did for the SparkContext object in sc. Let's take a look at it:
In [ ]:
?sqlContext
We can inspect some of the SQLContext properties:
In [ ]:
print(dir(sqlContext))
The first action executed on a sqlContext in a Spark session will take extra time, since Spark uses this moment to initialize all the Spark SQL scaffolding.
In [ ]:
# Create a tiny dataset
sqlContext.range(1, 7, 2).collect()
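As a quick sanity check that needs no Spark at all: the ids produced by sqlContext.range(1, 7, 2) follow Python's half-open range semantics, so we can predict them with the built-in range:

```python
# sqlContext.range(1, 7, 2) yields one row per id: start inclusive,
# end exclusive, stepping by 2 -- the same semantics as Python's range()
expected_ids = list(range(1, 7, 2))
print(expected_ids)  # [1, 3, 5]
```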
In [ ]:
if sc.version.startswith('2'):
    print(spark.version)
    print(type(spark))
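In Spark 2.x the entry point is the unified SparkSession, exposed by the kernel as spark, which subsumes the older SQLContext. The cell above gates on the major-version string; a pure-Python sketch of that check (the version strings below are illustrative, not read from a live cluster):

```python
# Gate on the major version: the SparkSession (`spark`) exists from 2.x on.
# Sample version strings are illustrative only.
for version in ('1.6.3', '2.1.0'):
    has_session = version.startswith('2')
    print(version, '->', 'SparkSession available' if has_session else 'use sqlContext')
```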
In [ ]:
data = [('Alice', 44), ('Bob', 32), ('Charlie', 62)]
df = spark.createDataFrame(data, schema=('name', 'age'))
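Each tuple in data becomes one row, and the names given as schema label the columns. A pure-Python sketch of that pairing, with no Spark needed:

```python
# Pair each row tuple with the schema names, as createDataFrame does
data = [('Alice', 44), ('Bob', 32), ('Charlie', 62)]
columns = ('name', 'age')
rows = [dict(zip(columns, row)) for row in data]
print(rows[0])  # {'name': 'Alice', 'age': 44}
```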
In [ ]:
df.toPandas()
In [ ]: